The Prediction of Bacterial Transcription Start Sites Using SVMs
Abstract
Identifying promoters is the key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). In this paper, we address the problem of predicting transcription start sites in Escherichia coli. Knowing the TSS position, one can then predict the promoter position to within a few base pairs, and vice versa. The accepted method for promoter prediction is to use a pair of position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However, this method is known to produce a large number of false positive predictions, which limits its usefulness to the experimental biologist. We adopt an alternative approach based on the Support Vector Machine (SVM) using a modified mismatch spectrum kernel. Our modifications involve tagging the motifs with their location and selectively pruning the feature set. We quantify the performance of several SVM models and a PWM model using the area under the detection-error tradeoff (DET) curve as the performance metric. SVM models are shown to outperform the PWM on a biologically realistic TSS prediction task. We also describe a more broadly applicable peak scoring technique that reduces the number of false positive predictions, greatly enhancing the utility of our results.
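To make the idea of a location-tagged mismatch spectrum kernel more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes fixed-length, uppercase DNA windows over the alphabet ACGT, tags each k-mer with its offset inside the window (the paper tags motifs by location, but the exact scheme is not given in the abstract), and omits the feature pruning step entirely. The function names `mismatch_neighbors`, `tagged_features`, and `tagged_mismatch_kernel`, and the defaults `k=3`, `m=1`, are hypothetical choices for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def mismatch_neighbors(kmer, m):
    """Return every k-mer within Hamming distance m of kmer."""
    neighbors = {kmer}
    for _ in range(m):
        step = set()
        for s in neighbors:
            for i in range(len(s)):
                for c in ALPHABET:
                    # Substituting the same base leaves s unchanged,
                    # so closer neighbors are retained as well.
                    step.add(s[:i] + c + s[i + 1:])
        neighbors |= step
    return neighbors


def tagged_features(seq, k=3, m=1):
    """Count (offset, k-mer) features, allowing up to m mismatches.

    The offset tag is what distinguishes this from a plain mismatch
    spectrum: the same motif at a different offset is a different feature.
    """
    feats = Counter()
    for pos in range(len(seq) - k + 1):
        for nb in mismatch_neighbors(seq[pos:pos + k], m):
            feats[(pos, nb)] += 1
    return feats


def tagged_mismatch_kernel(a, b, k=3, m=1):
    """Inner product of the two location-tagged feature maps."""
    fa = tagged_features(a, k, m)
    fb = tagged_features(b, k, m)
    return sum(count * fb[key] for key, count in fa.items())


if __name__ == "__main__":
    # Same 2-mers, different offsets: the tagged kernel sees no overlap.
    print(tagged_mismatch_kernel("ACGG", "GGAC", k=2, m=0))
    # Aligned windows differing by one base still share tagged features.
    print(tagged_mismatch_kernel("ACGT", "ACGA", k=2, m=1))
```

A Gram matrix built from such a function could in principle be passed to any SVM implementation that accepts precomputed kernels. The location tag reflects the observation in the abstract that promoters lie in tightly constrained positions relative to the TSS, so where a motif occurs carries information in addition to whether it occurs.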
Similar resources
Improved prediction of bacterial transcription start sites
MOTIVATION: Identifying bacterial promoters is an important step towards understanding gene regulation. In this paper, we address the problem of predicting the location of promoters and their transcription start sites (TSSs) in Escherichia coli. The accepted method for this problem is to use position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However ...
SVM Based Prediction of Bacterial Transcription Start Sites
Identifying bacterial promoters is the key to understanding gene expression. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). Knowing the TSS position, one can predict promoter positions to within a few base pairs, and vice versa. As a route to promoter identification, we formally address the problem of TSS prediction, drawing on the RegulonDB datab...
Mycobacterium avium subsp. paratuberculosis induces differential cytosine methylation at miR-21 transcription start site region
Mycobacterium avium subspecies paratuberculosis (MAP), as an obligate intracellular bacterium, causes paratuberculosis (Johne’s disease) in ruminants. Plus, MAP has consistently been isolated from Crohn’s disease (CD) lesions in humans; a notion implying possible direct causative ...
PREDetector: a new tool to identify regulatory elements in bacterial genomes.
In the post-genomic area, the prediction of transcription factor regulons by position weight matrix-based programmes is a powerful approach to decipher biological pathways and to modelize regulatory networks in bacteria. The main difficulty once a regulon prediction is available is to estimate its reliability prior to start expensive experimental validations and therefore trying to find a way h...
Bacterial start site prediction.
With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes...
Journal: International Journal of Neural Systems
Volume 16, Issue 5
Pages: -
Publication date: 2006